Ambient Sound Provides Supervision for Visual Learning
نویسندگان
چکیده
The sound of crashing waves, the roar of fast-moving cars – sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, through this process, the network learns a representation that conveys information about objects and scenes. We evaluate this representation on several recognition tasks, finding that its performance is comparable to that of other state-of-the-art unsupervised learning methods. Finally, we show through visualizations that the network learns units that are selective to objects that are often associated with characteristic sounds.
منابع مشابه
Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
The sound of crashing waves, the roar of fastmoving cars – sound conveys important information about the objects in our surroundings. In this work, we show that ambient sounds can be used as a supervisory signal for learning visual models. To demonstrate this, we train a convolutional neural network to predict a statistical summary of the sound associated with a video frame. We show that, throu...
متن کاملLearning to Localize Sound Source in Visual Scenes
Visual events are usually accompanied by sounds in our daily lives. We pose the question: Can the machine learn the correspondence between visual scene and the sound, and localize the sound source only by observing sound and visual scene pairs like human? In this paper, we propose a novel unsupervised algorithm to address the problem of localizing the sound source in visual scenes. A two-stream...
متن کاملQuestion Answering through Transfer Learning from Large Fine-grained Supervision Data
We show that the task of question answering (QA) can significantly benefit from the transfer learning of models trained on a different large, fine-grained QA dataset. We achieve the state of the art in two well-studied QA datasets, WikiQA and SemEval-2016 (Task 3A), through a basic transfer learning technique from SQuAD. For WikiQA, our model outperforms the previous best model by more than 8%....
متن کاملEvaluating non-speech sound visualizations for the deaf
Sounds such as co-workers chatting nearby or a dripping faucet help us maintain awareness of and respond to our surroundings. Without a tool that communicates ambient sounds in a nonauditory manner, maintaining this awareness is difficult for people who are deaf. We present an iterative investigation of peripheral, visual displays of ambient sounds. Our major contributions are: (1) a rich under...
متن کاملOil spill detection using in Sentinel-1 satellite images based on Deep learning concepts
Awareness of the marine area is very important for crisis management in the event of an accident. Oil spills are one of the main threats to the marine and coastal environments and seriously affect the marine ecosystem and cause political and environmental concerns because it seriously affects the fragile marine and coastal ecosystem. The rate of discharge of pollutants and its related effects o...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016